MUSCLE: multi-view and multi-scale attentional feature fusion for microRNA–disease associations prediction

Abstract MicroRNAs (miRNAs) synergize with various biomolecules in human cells, resulting in diverse functions in regulating a wide range of biological processes. Predicting potential disease-associated miRNAs as valuable biomarkers contributes to the treatment of human diseases. However, few previous methods take a holistic perspective; most concentrate only on isolated miRNA and disease objects, thereby ignoring the multiple relationships present in human cells. In this work, we first constructed a multi-view graph based on the relationships between miRNAs and various biomolecules, and then utilized a graph attention neural network to learn the graph topology features of miRNAs and diseases for each view. Next, we added a further attention mechanism and developed a multi-scale feature fusion module, aiming to determine the optimal fusion results for the multi-view topology features of miRNAs and diseases. In addition, the prior attribute knowledge of miRNAs and diseases was simultaneously added to achieve better prediction results and solve the cold start problem. Finally, the learned miRNA and disease representations were concatenated and fed into a multi-layer perceptron for end-to-end training and prediction of potential miRNA–disease associations. 
To assess the efficacy of our model (called MUSCLE), we performed 5- and 10-fold cross-validation (CV), which yielded average areas under the ROC curve of 0.966 ± 0.0102 and 0.973 ± 0.0135, respectively, outperforming most current state-of-the-art models. We then examined the impact of crucial parameters on prediction performance and performed ablation experiments on the feature combination and model architecture. Furthermore, the case studies on colon cancer, lung cancer and breast cancer also demonstrate the good inductive capability of MUSCLE. Our data and code are freely available at a public GitHub repository: https://github.com/zht-code/MUSCLE.git.


INTRODUCTION
MicroRNAs (miRNAs) are a group of non-coding RNAs about 22 nucleotides long that control gene expression by targeting messenger RNA for translational inhibition or degradation [1]. In recent years, numerous studies have elucidated the strong correlation between miRNAs and the occurrence of diverse human diseases [2]. Clinical studies have shown that miR-145 and miR-218 are associated with the prognosis of patients with laryngeal cancer [3]. Additionally, the aberrant expression of miR-107 can result in abnormal activity of BACE1 (beta-secretase 1) and contribute to the pathogenesis of Alzheimer's disease [4]. Studies have demonstrated that miRNAs play vital roles in various biological processes, including cell apoptosis, differentiation and development. The growing attention towards their relationships with complex diseases further underscores their importance [5]. Traditional experimental methods are accurate but usually inefficient, as they often require extensive laboratory capabilities and analyses. Moreover, they are time-consuming and costly, as they involve multiple steps of sample preparation, processing and measurement. Therefore, there is a need for alternative computational methods that can provide fast, accurate and scalable results for predicting miRNA-disease associations (MDAs) [6,7].
The prediction of potential MDAs has gained momentum due to the rapid advancement of artificial intelligence. Most of these approaches build on the assumption that miRNAs with similar functions have higher chances of being associated with diseases of similar phenotypes. For instance, Ning et al. [8] presented a computational approach, called AMHMDA, for predicting MDAs. They utilized a fusion attention mechanism and hypernodes to enhance information extraction and prediction accuracy. Experimental results on HMDD v3.2 demonstrated AMHMDA's superior performance compared to other methods, validating its robust predictive capability. Li et al. [9] introduced HGANMDA, which utilized node- and semantic-layer attention. They also adopted the similarity assumption and used a bilinear decoder to reconstruct the miRNA-disease connections.
In addition, computational approaches based on matrix decomposition techniques, which are commonly used to predict MDAs, have also developed rapidly. For instance, Chen et al. [10] introduced a neighborhood constraint matrix completion method, called NCMCMDA, for MDA prediction. NCMCMDA incorporated neighborhood constraints and matrix completion technology and effectively utilized similarity information to aid prediction, demonstrating superior performance compared to previous computational methods. Chen et al. [11] introduced MDHGI, a computational model based on matrix decomposition. Integrating various similarity measures and a sparse learning method, MDHGI applied matrix decomposition before constructing the heterogeneous network, significantly enhancing prediction accuracy. Ha et al. [12] introduced a computational framework called SMAP, which also utilized a matrix factorization model and integrated comprehensive similarity measurements for identifying MDAs. SMAP incorporated similarity constraints and demonstrated strong AUCs, underscoring the effectiveness of the matrix factorization approach.
To date, despite the variety of computational methods that have been used to predict MDAs, many of them do not take a holistic perspective and concentrate only on isolated miRNA and disease objects, thereby ignoring the multiple relationships present in human cells. Moreover, an efficient fusion method for multi-source features and the incorporation of prior domain knowledge are very important to a model's prediction ability and application scenarios. On this basis, we proposed MUSCLE for predicting potential associations between miRNAs and diseases. The architecture of MUSCLE is shown in Figure 1. Specifically, we first combined multi-source information to construct a multi-view graph, including a miRNA-drug-disease graph, a miRNA-messenger RNA (mRNA)-disease graph and a miRNA-long noncoding RNA (lncRNA)-disease graph. Then, the graph attention neural network was utilized to learn the graph topology features for each view. Next, a multi-scale feature fusion module was designed for efficiently fusing these topology features. In addition, the prior attribute knowledge of miRNAs and diseases was simultaneously added to achieve better prediction results and solve the cold start problem. Finally, the learned representations were concatenated and fed into a multi-layer perceptron (MLP) for end-to-end training and prediction. The main contributions of MUSCLE are summarized as follows: (i) We utilize the relationships between miRNAs and various biomolecules to build a multi-view graph, and utilize a graph attention network to capture graph topology features for each view. (ii) Based on the motivation that a more efficient feature fusion strategy can improve the predictive ability of the model, we design a multi-scale feature fusion module for efficiently fusing multiple topology features by incorporating the local context into the global context within the attention module. (iii) The prior attribute knowledge of miRNAs and diseases is further added to achieve better prediction ability and solve the cold start problem. (iv) Our method shows excellent predictive performance. Each module is tuned to its optimum, and the case studies demonstrate the strong inductive ability of MUSCLE.

Data sources
We obtained MDAs from the Human MicroRNA Disease Database (HMDD v3.2) [13], the latest comprehensive repository for human MDAs, encompassing a broader spectrum of experimentally supported associations. In total, 12,446 experimentally confirmed MDAs involving 901 miRNAs and 877 diseases were selected in this study. We also obtained 269 drug-miRNA pairs and 17,414 drug-disease pairs from the DrugBank database [14] to construct the heterogeneous miRNA-drug-disease graph. From the US National Library of Medicine database, we obtained 5186 miRNA-mRNA pairs and 8958 mRNA-disease pairs to construct the heterogeneous miRNA-mRNA-disease graph. Furthermore, we obtained 8634 lncRNA-miRNA pairs and 874 lncRNA-disease pairs from the NONCODEV5 database [15] to construct the heterogeneous miRNA-lncRNA-disease graph. Finally, an equal number of randomly selected non-MDAs were used as negative controls.

Sequence-based attribute feature for miRNAs
To capture a comprehensive representation of miRNA features, we incorporated miRNA sequence information into our analysis. Specifically, we retrieved all sequences from the miRBase database [16] and transformed them into vectors using the k-mers method, which divides a sequence into subsequences of length k, resulting in m − k + 1 k-mers for a sequence of length m. In this study, we extracted adjacent 3-mers from the miRNA sequences. Since miRNAs contain four nucleotides (A, C, G and U), there are 64 possible 3-mers, such as AAA, AAC, ..., UUU. We slid a window along the miRNA sequence and computed the frequency of each subsequence. Subsequently, we normalized these frequency values to create a 64-dimensional vector that represented the miRNA sequence information, enabling the capture of miRNA attributes.
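The sliding-window 3-mer encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `kmer_frequency_vector`, the example sequence, and the choice to normalize by the total number of counted windows are all assumptions made for the sketch.

```python
from itertools import product


def kmer_frequency_vector(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of an RNA sequence."""
    # Enumerate all 4**k possible k-mers in a fixed order (AAA, AAC, ..., UUU).
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper().replace("T", "U")
    # Slide a window of width k: a sequence of length m yields m - k + 1 windows.
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window in index:  # windows with ambiguous bases (e.g. N) are skipped
            counts[index[window]] += 1
    total = sum(counts)
    # Assumed normalization: divide each count by the number of counted windows.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


# Illustrative 22-nt sequence (hypothetical input, chosen only for the demo).
vec = kmer_frequency_vector("UGAGGUAGUAGGUUGUAUAGUU")
print(len(vec), round(sum(vec), 6))
```

With k = 3 the vector always has 64 entries regardless of sequence length, which is what allows miRNAs of different lengths to share one attribute space.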

Semantic similarity-based attribute feature for diseases
The Medical Subject Headings (MeSH) [17] offer a comprehensive system for disease classification. Building on this, the connections between different diseases are depicted by a directed acyclic graph (DAG). In this graph, nodes symbolize the MeSH descriptors of the diseases, while the directed edges link from broader entities to more detailed ones. For example, a disease A is described by DAG(A), which consists of D(A), representing A and its ancestor nodes, and E(A), denoting all the direct edges. Next, we defined the semantic contribution of disease term t in DAG(A). Disease A keeps a semantic value contribution of 1 by itself, while the contribution of any other term t is scaled by the decay factor of the semantic contribution, which reduces the impact of disease t when it differs from A.
The semantic value of disease A was then computed from the semantic contributions of the terms in DAG(A). Thus, the first disease semantic similarity (DSS1) between diseases d_i and d_j is determined through the common nodes shared by the two DAGs. In addition, we further differentiated the contribution of diseases, as some of them appeared more frequently in other DAGs. Specifically, diseases that appear more often in other DAGs should contribute less than those that appear in fewer DAGs, and the semantic value of disease A is influenced by disease t accordingly. Thus, the second disease semantic similarity (DSS2) between diseases d_i and d_j can also be determined from the common nodes shared by the two DAGs. Finally, the sum of the two semantic similarity models is adopted as the attribute feature of diseases to achieve better prediction performance and solve the cold start problem.
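For reference, the two semantic similarity models described above are commonly written as follows in the MDA literature. This is a hedged reconstruction of the standard formulation, not a verbatim copy of the authors' equations: the symbols D1, D2, DV1, DV2, the decay factor Δ, and the counts n_t (number of DAGs containing term t) and n_d (total number of diseases) are assumed names.

```latex
\begin{align}
\mathrm{DAG}(A) &= \bigl(A,\; D(A),\; E(A)\bigr) \\[4pt]
D1_A(t) &=
  \begin{cases}
    1, & t = A \\
    \max\bigl\{\Delta \cdot D1_A(t') \;\big|\; t' \in \text{children of } t \bigr\}, & t \neq A
  \end{cases} \\[4pt]
DV1(A) &= \sum_{t \in D(A)} D1_A(t) \\[4pt]
DSS1(d_i, d_j) &=
  \frac{\sum_{t \in D(d_i) \cap D(d_j)} \bigl(D1_{d_i}(t) + D1_{d_j}(t)\bigr)}
       {DV1(d_i) + DV1(d_j)} \\[4pt]
D2_A(t) &= -\log\frac{n_t}{n_d} \\[4pt]
DV2(A) &= \sum_{t \in D(A)} D2_A(t) \\[4pt]
DSS2(d_i, d_j) &=
  \frac{\sum_{t \in D(d_i) \cap D(d_j)} \bigl(D2_{d_i}(t) + D2_{d_j}(t)\bigr)}
       {DV2(d_i) + DV2(d_j)}
\end{align}
```

In the first model the contribution of a term shrinks by a factor Δ for every level it sits above A, while in the second model a term that occurs in many DAGs (large n_t) receives a small contribution, matching the intuition that frequently shared ancestors are less specific.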

Gaussian interaction profile kernel similarity-based attribute feature for miRNAs and diseases
We also computed the Gaussian interaction profile (GIP) kernel similarity for miRNAs and diseases, based on the hypothesis that miRNAs with similar functions tend to be related to similar diseases and vice versa. We first represented the associations between disease d(i) and each miRNA as a binary vector G(d(i)). The GIP kernel similarity between diseases d(i) and d(j), KD(d(i), d(j)), was then defined as a Gaussian kernel on the interaction profiles G(d(i)) and G(d(j)), where the parameter γ_d is the bandwidth of the kernel, obtained by normalizing an original bandwidth parameter. We used the same method to calculate the GIP kernel similarity for miRNAs.
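A minimal NumPy sketch of the GIP kernel is given below. It assumes the conventional form of the kernel, exp(−γ‖G(a) − G(b)‖²), with the bandwidth γ obtained by dividing an original parameter by the mean squared norm of the interaction profiles; the function name `gip_kernel`, the default `gamma_prime=1.0` and the toy association matrix are assumptions for illustration only.

```python
import numpy as np


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix."""
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    # Assumed normalization: original bandwidth divided by the mean squared
    # norm of the interaction profiles (requires at least one association).
    gamma = gamma_prime / sq_norms.mean()
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy miRNA x disease association matrix (4 miRNAs, 3 diseases), made up.
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1]])
KM = gip_kernel(A)    # miRNA-miRNA similarity from the rows
KD = gip_kernel(A.T)  # disease-disease similarity from the columns
print(KM.shape, KD.shape)
```

Rows 0 and 3 of the toy matrix have identical profiles, so their similarity is exactly 1; profiles that share no associations receive the smallest values.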

Integrated attribute features for miRNAs and diseases
To provide a more comprehensive depiction of the attribute features of miRNAs, we concatenated the sequence-based feature (SM) and the GIP kernel similarity-based feature (KM) to design an integrated miRNA attribute feature matrix DM. Furthermore, an integrated disease attribute feature matrix DD was also designed based on the semantic similarity 1 (DSS1), the semantic similarity 2 (DSS2) and the GIP kernel similarity of diseases (KD). Specifically, the entry of DD for diseases d(i) and d(j) was obtained according to whether d(i) and d(j) have semantic similarity.

Multi-view heterogeneous graphs construction
To uncover potential associations between miRNAs and diseases that might have been overlooked due to the complexity and heterogeneity of disease pathways, we employed other biomolecules as mediators to construct multi-view graphs. We constructed three heterogeneous graphs: a miRNA-mRNA-disease graph, a miRNA-lncRNA-disease graph and a miRNA-drug-disease graph. Taking the miRNA-mRNA-disease graph as an example (the remaining two graphs were constructed in the same way), we used the miRNAs, mRNAs and diseases as nodes of the graph, and collected miRNA-mRNA associations and mRNA-disease associations as edges of the graph. Note that MDAs were removed from the graph to prevent label leakage. After that, we represented the miRNA-mRNA-disease graph as an adjacency matrix MMD. If there is an association between two items in the matrix, we set the element at the corresponding position to 1; otherwise, we set it to 0. Then, we constructed the miRNA-lncRNA-disease graph and the miRNA-drug-disease graph in the same way, based on the miRNA-lncRNA, lncRNA-disease, miRNA-drug and drug-disease associations. We similarly generated the adjacency matrix MLD for the miRNA-lncRNA-disease graph, and the adjacency matrix MDD for the miRNA-drug-disease graph.
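The construction of one view can be sketched as follows. The helper `build_adjacency`, the node names and the treatment of edges as undirected (a symmetric matrix) are assumptions for illustration; the sketch only shows that miRNA-disease pairs are never written into the matrix, so the known MDAs cannot leak into the graph.

```python
import numpy as np


def build_adjacency(mirnas, mediators, diseases, mir_med_pairs, med_dis_pairs):
    """Binary adjacency matrix of a miRNA-mediator-disease graph."""
    # Node order: miRNAs first, then mediators (mRNA, lncRNA or drug), then diseases.
    nodes = list(mirnas) + list(mediators) + list(diseases)
    index = {name: i for i, name in enumerate(nodes)}  # names assumed unique
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.int8)
    # Only miRNA-mediator and mediator-disease edges are added; direct
    # miRNA-disease edges are deliberately left out to avoid label leakage.
    for a, b in list(mir_med_pairs) + list(med_dis_pairs):
        i, j = index[a], index[b]
        adj[i, j] = adj[j, i] = 1  # assumed undirected
    return adj, index


# Hypothetical toy view with two miRNAs, two mediators and one disease.
adj, index = build_adjacency(
    mirnas=["miR-a", "miR-b"],
    mediators=["gene-x", "gene-y"],
    diseases=["disease-1"],
    mir_med_pairs=[("miR-a", "gene-x"), ("miR-b", "gene-y")],
    med_dis_pairs=[("gene-x", "disease-1")],
)
print(adj)
```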

Multi-view graph attention network
To better capture the graph topological features of miRNA nodes and disease nodes in each of the heterogeneous graphs (views) we constructed, we utilized the graph attention neural network for each view. We then fused the miRNA and disease features from the three views into their final graph structure features with a multi-scale feature fusion module. Taking the miRNA-drug-disease graph as an example (with the corresponding adjacency matrix MDD), we first randomly initialized the feature representation of each miRNA, drug and disease node in the graph as x = {x_1, x_2, x_3, ..., x_|D|}, x_i ∈ R^F, where |D| denotes the number of nodes in the graph and F denotes the dimension of each feature vector. The output of each layer was a new set of graph topology features of the nodes, x′ = {x′_1, x′_2, x′_3, ..., x′_|D|}, x′_i ∈ R^F′. Generally speaking, the graph attention network first calculated the attention coefficients of adjacent nodes through the self-attention mechanism, where W ∈ R^(F′×F) represents a learnable weight matrix, and x_i, x_j and x_k represent the feature vectors of nodes i, j and k, respectively. The symbols ·, ‖ and T represent the multiplication, concatenation and transposition operations, respectively; a^T ∈ R^(2F′) is the weight parameter of a single-layer feedforward neural network a; LeakyReLU is the nonlinear activation function; exp is the exponential function; N_i represents all neighbor nodes of node i; and α_ij represents the attention coefficient from node i to node j. To enhance the model's fitting ability, we then incorporated the multi-head attention mechanism, that is, utilizing multiple W matrices to calculate various attention coefficients simultaneously. The final feature representation of the node was obtained by concatenating the results calculated by each W matrix, where α^φ_ij represents the φth attention coefficient, σ is the activation function and x′_i represents the final output feature representation of node i. Similarly, we utilized the graph attention network to extract graph topological features of miRNAs and diseases on the miRNA-mRNA-disease graph and the miRNA-lncRNA-disease graph, which correspond to the adjacency matrices MMD and MLD, respectively. The difference lay in the dimensionality of the adjacency matrices MMD and MLD, i.e. the number of nodes in the miRNA-mRNA-disease and miRNA-lncRNA-disease graphs, which are 3929 and 2459, respectively. Furthermore, the learning process of the graph attention network was stopped when the node representations no longer underwent large changes, and the final node representations were obtained. Finally, we picked out the miRNAs and diseases from each view and concatenated the output features of the nodes, with a dimension of 1778.
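The attention computation described above follows the usual graph attention pattern: score each edge with LeakyReLU(a^T [W x_i ‖ W x_j]), normalize the scores over the neighbors of node i with a softmax, aggregate the transformed neighbor features, and concatenate the heads. The NumPy sketch below illustrates that pattern under several assumptions that are not stated in the text: self-loops are added so every node has at least one neighbor, ELU is used as the activation σ, and all sizes, weights and function names are made up.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def elu(z):
    # Assumed choice for the activation sigma.
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))


def gat_head(x, adj, W, a, slope=0.2):
    """One attention head. x: (N, F), adj: (N, N), W: (F_out, F), a: (2 * F_out,)."""
    h = x @ W.T                                   # W x_i for every node: (N, F_out)
    f_out = h.shape[1]
    # a^T [W x_i || W x_j] splits into a source term plus a target term.
    scores = leaky_relu((h @ a[:f_out])[:, None] + (h @ a[f_out:])[None, :], slope)
    scores = np.where(adj > 0, scores, -np.inf)   # keep only neighbors of i
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                      # exp(-inf) = 0 for non-neighbors
    alpha = weights / weights.sum(axis=1, keepdims=True)  # softmax over N_i
    return alpha, alpha @ h                       # coefficients, aggregated features


def multi_head_gat(x, adj, heads):
    """Concatenate the activated outputs of several (W, a) heads."""
    return np.concatenate([elu(gat_head(x, adj, W, a)[1]) for W, a in heads], axis=1)


rng = np.random.default_rng(0)
n_nodes, f_in, f_out, n_heads = 4, 5, 3, 2
x = rng.normal(size=(n_nodes, f_in))              # random initial node features
adj = np.eye(n_nodes)                             # self-loops (assumption)
for i, j in [(0, 1), (1, 2), (2, 3)]:             # toy chain graph
    adj[i, j] = adj[j, i] = 1
heads = [(rng.normal(size=(f_out, f_in)), rng.normal(size=2 * f_out))
         for _ in range(n_heads)]

alpha, _ = gat_head(x, adj, *heads[0])
out = multi_head_gat(x, adj, heads)
print(alpha.round(3))
print(out.shape)
```

Each row of `alpha` sums to 1 and is zero wherever the adjacency matrix has no edge, which is the defining property of the attention coefficients α_ij.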

Multi-scale feature fusion module
To effectively fuse the three kinds of graph structure features of miRNAs and diseases, so as to characterize them more accurately and improve the prediction accuracy of potential associations, we designed a multi-scale attentional feature fusion module, as shown in Figure 1D. Specifically, this module aggregated both the local and global feature context of the three kinds of graph structure features of miRNAs and diseases. To aggregate the local channel context of these features, we adopted the point-wise convolution (PWConv) method [18], a form of convolution that employs a 1×1 kernel and considers only point-wise channel interactions at each spatial position. The local channel context L(X) was calculated by a bottleneck structure built from PWConv layers. For the global channel context g(X) ∈ R^3, global average pooling (GAP) was applied over the feature matrix X, where i and j denote the row number and column number of X, respectively. With both the global channel context g(X) and the local channel context L(X), the multi-scale attention mechanism provides the refined feature X′ ∈ R^(1778×889) by combining L(X) and g(X) through the operations ⊕, σ and ⊗, where ⊕ denotes the broadcasting addition, σ denotes the Sigmoid function and ⊗ denotes the element-wise multiplication. The final fusion features of miRNAs and diseases were then fed into an MLP for training and prediction with a standard binary cross-entropy loss function.
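The gating step can be illustrated with a small NumPy sketch. It assumes the three views are stacked as channels of a tensor X of shape (3, H, W), that the bottleneck is two point-wise convolutions with a ReLU in between (normalization layers omitted), and that the refined feature is X ⊗ σ(L(X) ⊕ g(X)). How the three refined channels are reduced to the final 1778×889 matrix is not specified here and is left out; all shapes, weights and function names are toy assumptions.

```python
import numpy as np


def pointwise_conv(x, weight):
    """1x1 convolution: mixes channels independently at every position.

    x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)
    """
    return np.einsum("oc,chw->ohw", weight, x)


def multi_scale_attention(x, w_reduce, w_expand):
    """Refine x with a gate built from local and global channel context."""
    # Local context L(X): bottleneck of two point-wise convolutions (assumed
    # ReLU in between), giving one value per channel and position.
    hidden = np.maximum(pointwise_conv(x, w_reduce), 0.0)
    local_ctx = pointwise_conv(hidden, w_expand)           # (C, H, W)
    # Global context g(X): global average pooling over rows and columns.
    global_ctx = x.mean(axis=(1, 2))                       # (C,)
    # Broadcasting addition, sigmoid gate, element-wise multiplication.
    gate = 1.0 / (1.0 + np.exp(-(local_ctx + global_ctx[:, None, None])))
    return x * gate


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6, 4))         # three views stacked as channels (toy size)
w_reduce = rng.normal(size=(2, 3))     # bottleneck: 3 channels -> 2
w_expand = rng.normal(size=(3, 2))     # and back: 2 channels -> 3
X_refined = multi_scale_attention(X, w_reduce, w_expand)
print(X_refined.shape)
```

Because the gate lies strictly between 0 and 1, the module can only attenuate entries of X; the local branch decides this per position, while the global branch shifts the gate for a whole view at once.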

Performance evaluation
We performed 5- and 10-fold CV on our model to assess its prediction performance and generalization ability. In the CV, we randomly split all the known MDAs into 5 or 10 groups, and then used one of the groups as the test data and the rest as the training data. We repeated this process 5 or 10 times. Finally, we calculated the mean of the test results to evaluate the model. Furthermore, we also plotted the receiver operating characteristic curves (ROCs) and precision-recall curves (PRCs) in the 5- and 10-fold CV to visualize our prediction results (as shown in Figure 2). The area under the ROC (AUC) is an evaluation indicator for binary classification models; it indicates the probability that a positive example is ranked ahead of a negative example. Similarly, the area under the PRC (AUPR) was also used as an evaluation indicator. In addition, we used five other indicators, including accuracy, sensitivity, specificity, precision and Matthews correlation coefficient (MCC), for the performance evaluation of our model (as shown in Tables 1 and 2). The average AUC values of MUSCLE under 5- and 10-fold CV reached 0.9666 and 0.9737, respectively, with standard deviations of only 0.0102 and 0.0135. Furthermore, the average AUPR values of MUSCLE reached 0.9649 and 0.9725, respectively. All these indicators and visualizations demonstrate the excellent performance and robustness of MUSCLE for predicting potential MDAs.
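The ranking interpretation of the AUC and the fold splitting can be made concrete with a short sketch. The functions `auc_by_ranking` and `k_fold_indices`, the toy labels and scores, and the seed are illustrative assumptions rather than the evaluation code used in this study.

```python
import numpy as np


def auc_by_ranking(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative score count as half a correct pair.
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


def k_fold_indices(n_samples, k, seed=123):
    """Yield (train, test) index arrays for a random k-fold split."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# Toy predictions: 3 positives and 3 negatives.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = auc_by_ranking(labels, scores)
print(round(float(auc), 4))  # 8 of the 9 positive-negative pairs are ordered correctly

splits = list(k_fold_indices(n_samples=10, k=5))
```

In practice the per-fold AUC values would be averaged, and their standard deviation reported, in the same way as the figures quoted above.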

Parameter analysis
To achieve optimal classification results, we performed a parameter analysis of the MUSCLE method, focusing on two crucial parameters: the embedding dimensions generated by the graph attention network and the number of layers of the MLP. To ensure fairness, we changed only one parameter at a time and kept the other parameters unchanged. Furthermore, to enhance experiment reliability and accuracy for each parameter, 5-fold CV was conducted. In the following sections, we provide detailed experimental descriptions and results.

Impact of embedding dimensions
We first discussed the impact of the embedding dimensions generated by the multi-view graph attention network. We set the embedding dimensions to 600, 700, 878, 902 and 1778, of which 1778 is the sum of the dimensions of the attribute features. Table 3 and Figure 3(A) show the experimental performance for these parameters. It can be seen from the results that as the dimension increases, the predicted AUC of the model continues to increase. When the dimension reaches 1778, the results are greatly improved. We conjectured that larger feature dimensions provided more information, but that a continued increase in feature dimensions would instead add noise, degrading performance and introducing problems such as computational complexity and time overhead.

Impact of MLP Layers
The number of hidden layers of the MLP also had a great impact on the prediction results of MUSCLE. We adjusted the number of hidden layers from 2 to 5 while keeping the other parameters unchanged. Table 4 and Figure 3(B) show the experimental results obtained for these parameters. As the number of hidden layers increased, the model's performance initially improved. However, as the number continued to increase, the performance decreased due to over-fitting. Based on this, we set the number of MLP hidden layers to 4.

Ablation experiments
We integrated the biological attribute and three topological features to represent miRNA and disease nodes. In this section, we conducted ablation experiments to examine the contribution of these features. Furthermore, we also examined the validity of our fusion module through the ablation experiments. Similar to the previous experiments, we adopted a control variable method and used the average result of 5-fold CV as the final evaluation metric.

Ablation experiment for different features
For convenience, we denote the attribute features of miRNAs and diseases as V_attr, and the topological features with drugs, mRNAs and lncRNAs as intermediate nodes as V_drug, V_mRNA and V_lncRNA, respectively. Next, we conducted experiments using different combinations of features, using + to indicate that the corresponding features were considered simultaneously. Table 5 and Figure 4(A) show the performance of MUSCLE with different feature combinations. The feature combination strategy taken by MUSCLE leads to optimal performance. These heterogeneous topological features enable the model to have a stronger classification performance for potential MDAs.

Ablation experiment for different feature fusion strategies
To examine the validity of the multi-scale attentional feature fusion module, we conducted ablation experiments using five different feature fusion strategies, including an average value-based feature fusion strategy (F_ave), a dot product-based feature fusion strategy (F_dot), a graph convolutional neural network (GCN)-based feature fusion strategy (F_gcn), a strategy that removes the global features in the multi-scale attentional feature fusion module (F_lf) and a direct concatenation feature fusion strategy (F_cat). Table 6 and Figure 4(A) show the prediction performance of the different feature fusion strategies, where our multi-scale attentional feature fusion module (MUSCLE) leads to optimal performance. This appropriate feature fusion strategy enables the model to have a stronger classification performance for potential MDAs.

Comparison with one single heterogeneous graph strategy
In this work, we constructed three separate heterogeneous graphs of miRNAs and diseases to extract the relationships between them and other biomolecules from different perspectives. The other option for integrating these relationships is to construct one single heterogeneous graph (SHG) that contains all the different biomolecules. To compare the performance of the two strategies, we constructed one SHG with all five types of biomolecules (miRNAs, drugs, mRNAs, lncRNAs and diseases) and their associations, which integrates the three heterogeneous graphs. Note that the SHG generated only one kind of graph structural feature for miRNAs and diseases, so our multi-scale feature fusion module was not applicable and was discarded in this strategy. Other than that, we kept all the other conditions exactly the same as in our method. Specifically, we first applied the graph attention network to the SHG. We picked out the embedded features of miRNAs and diseases when the graph attention network converged. To be consistent with our method, we also integrated the same attribute features of miRNAs and diseases into the embeddings from the graph attention network. Finally, the miRNA and disease features were directly fed into the MLP for a 5-fold CV experiment. The training parameters were kept consistent with our method, including the number of iterations (200), the number of MLP layers (3), the random number seed (123), etc. Table 7 shows the comparison results between the SHG strategy and our method with the same evaluation metrics.
In addition, we also plot the ROC curves and PR curves of the two strategies for comparison, as shown in Figure 5. From these results, it can be seen that the prediction performance of the strategy with one SHG is not as good as that of our method. We speculate that the following factors may account for this phenomenon. First, increasing the complexity of the graph may not always result in better feature representation. It is worth exploring pruning or denoising methods on the graph to potentially improve prediction

Comparison with the state-of-the-art methods
In this section, seven published state-of-the-art (SOTA) methods were selected for comparison with our proposed MUSCLE method, including AMHMDA [8], MLRDFM [19], HGSMDA [20], DAEMDA [21], AGAEMD [22], MINIMDA [23] and MAMFGAT [24]. These methods adopt different techniques to predict potential MDAs, including heterogeneous network construction, GCN, feature fusion methods, attention mechanisms, etc. We briefly introduce the main workflow of these methods in the following list. To be consistent with our method, these baseline methods all meet the following three requirements: (1) Published after 2022. (2) Using the same training dataset as our method, i.e. the Human MicroRNA Disease Database (HMDD v3.2). (3) Using the average result of 5-fold CV as the evaluation indicator. Table 8 shows the comparison results between our MUSCLE method and these SOTA methods. The experimental evaluation demonstrated that MUSCLE was superior to the recently published methods. It had more promising accuracy and robustness in solving the potential MDA prediction problem.
The main workflows of these methods are as follows:
• The AMHMDA method included three main steps for predicting potential associations between miRNAs and diseases. These steps involved constructing multiple similarity networks for miRNAs and diseases, introducing hypernodes to create a heterogeneous hypergraph, and utilizing an attention mechanism to combine the outputs of a graph convolutional network for prediction.
• The MLRDFM method expands upon the DeepFM architecture by improving it in two main ways. Firstly, it includes item relationships by controlling their embedding features through similarity-based Laplacians. Secondly, it utilizes Laplacian eigenmaps to set the weights in the dense embedding layer, leading to more effective model training.
• The HGSMDA method extends the HyperGCN model and incorporates the Sørensen-Dice loss function. It begins by generating networks that capture the similarity between miRNAs and diseases, utilizing GCNs to extract a wide range of information. Subsequently, it forms a miRNA-disease heteromorphic hypergraph with HyperGCN and assesses the accuracy of predicted associations by comparing them to ground truth values using the Sørensen-Dice loss function.
• The DAEMDA method enhances the efficacy of current models by creating networks that capture the similarity between miRNAs and diseases, including both similarity networks and heterogeneous networks. Leveraging graph attention and self-attention-based feature encoders, it extracts information from neighboring nodes and the entire graph. Finally, it combines node embeddings from the dual-channel output and employs an MLP to predict associations between miRNAs and diseases.
• The AGAEMD method utilizes an encoder-decoder framework to predict potential associations between miRNAs and diseases. In the initial phase, it constructs miRNA functional similarity and feature matrices, drawing from disease semantic similarity and the miRNA-disease adjacency matrix. These matrices are subsequently processed by a deep graph attention network, resulting in informative feature embeddings. Finally, an inner product decoder reconstructs the predictive association matrix.
• The MINIMDA method fuses mixed high-order neighborhood information from multimodal networks to predict potential associations between miRNAs and diseases. It constructs integrated miRNA similarity and disease similarity networks using multisource information. The final step involves feeding multimodal embedding representations into a multilayer perceptron to predict underlying associations.
• The MAMFGAT method involves three key steps. Firstly, it constructs MDA and integrated networks. Secondly, the embedded representations of miRNAs and diseases are obtained by using a two-path graph attention layer, adaptive modality fusion and modality contrastive learning. Finally, it predicts association scores by connecting these representations through an MLP.
We further examine whether the performance gain of our method over the baseline algorithms stems from the information increment or from the algorithmic improvement, and how much each contributes. Our data framework and algorithms form an integrated whole, since the algorithms were developed on top of this data framework, and both the information increment and the algorithmic improvement contribute to the gain. First, the information increment provides more ways to characterize miRNAs and diseases, thereby helping the predictive model make more accurate decisions. Second, the algorithmic improvement makes the characterization more rational and lets the predictive model benefit from multiple perspectives on that characterization.

We compared the two contributions separately, using the average AUC under 5-fold CV as the evaluation indicator. First, we kept the algorithmic module unchanged and observed the performance gain of our method over the baseline methods as the information increased (Figure 6A). Second, we kept the data module unchanged and observed the performance gain of our method over the baseline methods as the algorithm improved (Figure 6B). The differently colored dotted lines in the figure represent different baseline methods. The abscissa in Figure 6(A) represents the increase of information, where $V_{a}$ represents only the attribute information of miRNAs and diseases, $V_{a+d}$ adds drug information to $V_{a}$, $V_{a+l}$ adds lncRNA information to $V_{a}$, $V_{a+m}$ adds mRNA information to $V_{a}$, $V_{a+m+l}$ adds mRNA and lncRNA information to $V_{a}$, and $V_{a+m+l+d}$ adds mRNA, lncRNA and drug information to $V_{a}$. The abscissa in Figure 6(B) represents the improvement of the algorithm, where $F_{dot}$ represents the simple dot product of multiple features, $F_{lf}$ represents removing the global attention module from the multi-scale attentional feature fusion module, $F_{gcn}$ represents the use of a graph convolutional network instead of a graph attention network, $F_{cat}$ represents the simple concatenation of multiple features and $F_{ms}$ represents the use of multi-scale attentional feature fusion.

The results show that both the information increment and the algorithmic improvement contribute to the ability of our method to outperform the baseline methods. In addition, the information increment, especially in the initial phase, contributes more to the improvement of our method than the algorithmic improvement does. This suggests that future work can pay more attention to enriching the data information and to developing new algorithms based on the new data framework. The two parts work together to improve the predictive performance of the model.
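As a minimal illustration of the evaluation indicator used above, the following sketch computes the AUC of each held-out fold and averages the values over folds. It is not taken from the MUSCLE repository: the function names `roc_auc` and `mean_auc_over_folds` and the toy labels and scores are assumptions made for this example, and the AUC is computed by the pairwise (Mann-Whitney) definition rather than by any particular library.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_over_folds(folds):
    """folds: iterable of (labels, scores) pairs, one pair per held-out fold."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Toy data (hypothetical): two held-out folds with four samples each.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),  # every positive outranks every negative
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),  # one positive falls below one negative
]
print(mean_auc_over_folds(folds))
```

In a comparison such as Figure 6, one such averaged value would be computed per configuration (for example per information level or per fusion variant) and plotted against the baselines.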

Case study
To examine the capacity of MUSCLE in practical applications, we selected three common diseases for case studies: lung cancer, breast cancer and colon cancer. First, all the known MDAs in our dataset were used for training. After that, MUSCLE scored the three test datasets for the corresponding diseases, and the top 50 miRNAs with the highest predicted scores were selected. Finally, we checked the predicted miRNAs against the dbDEMC [25] and miRCancer [26] databases. Case studies of each of the three diseases are detailed below.
Lung cancer is a highly fatal and prevalent malignancy worldwide [27], with non-small cell lung cancer (NSCLC) being the most common type, making up about 85% of all new lung cancer cases [28]. Researchers have found that miRNAs are strongly associated with lung cancer [29]. For example, microRNA-301b enhances drug resistance, reduces apoptosis and increases cell proliferation in lung cancer [30]. These discoveries not only deepen the study of lung cancer development but also present novel targets and therapeutic options for the identification and treatment of lung cancer. Compared with other cancer types, NSCLC is more resistant to chemotherapy, resulting in many patients still facing a poor prognosis. The experimental results are shown in Table 9, where 49 of the top 50 miRNAs were confirmed.
Breast cancer is a very frequent malignancy among women, and even without taking gender into account, it remains one of the most prevalent cancers after lung cancer. Even though breast cancer has a relatively good prognosis, it still ranks fifth in cancer mortality. In recent years, high miRNA expression levels have been found in breast cancer and linked to worse patient prognosis. Researchers have found that miRNAs regulate the expression of the EZH2 and ATM genes and promote tumor cell proliferation and invasion [31]. The experimental results are shown in Table 10, where 47 of the top 50 miRNAs were confirmed in the databases. As an example, hsa-mir-626, a key regulator of tumorigenesis, is expressed at significantly elevated levels in breast cancer. By interacting with miR-573, hsa-mir-626 suppressed the expression of related normal miRNAs and proteins. This implies that hsa-mir-626 could be a possible therapeutic and prognostic target for breast cancer [32].
Colon cancer is a prevalent gastrointestinal tumor worldwide with a low 5-year survival rate, especially in China [33]. Patients whose cancer is detected early have a very high probability of surviving, whereas delayed detection and increasing cancer severity greatly limit survival time. Therefore, it is vital to identify colon cancer early and promptly. A growing body of research has shown that miRNAs are essential for the onset, progression and treatment of colon cancer [34]. As shown in Table 11, 48 of the top 50 miRNAs were validated. As an example, Schepeler et al. [35] used significance analysis of microarrays (SAM) to detect specific miRNAs that are differentially expressed between colon cancer subtypes and normal mucosa. They discovered that the expression of hsa-miR-484 was considerably lower in colon cancer than in normal mucosa.
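The validation procedure described at the start of this section (rank the candidate miRNAs of a disease by predicted score, keep the top-ranked ones and check them against curated databases) can be sketched as follows. This is an illustrative sketch only: the helper names `top_k_candidates` and `confirm`, the miRNA labels and the database contents are hypothetical placeholders, not real dbDEMC or miRCancer records, and k = 3 is used instead of 50 to keep the toy example short.

```python
def top_k_candidates(scores, k):
    """Return the k miRNA names with the highest predicted scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def confirm(candidates, databases):
    """Keep the candidates found in at least one database, preserving rank order."""
    return [m for m in candidates if any(m in db for db in databases.values())]


# Toy predicted scores for one disease (hypothetical values and names).
predicted = {"miR-A": 0.97, "miR-B": 0.12, "miR-C": 0.88, "miR-D": 0.65, "miR-E": 0.91}

# Toy stand-ins for the two validation databases (hypothetical contents).
databases = {
    "dbDEMC": {"miR-A", "miR-C"},
    "miRCancer": {"miR-B"},
}

top = top_k_candidates(predicted, k=3)
confirmed = confirm(top, databases)
print(top)
print(f"{len(confirmed)} of the top {len(top)} candidates confirmed")
```

A candidate counts as confirmed if it appears in either database, which mirrors the way the counts for Tables 9 to 11 are reported in the text.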

CONCLUSION
In this work, we proposed a computational method (MUSCLE) to predict potential miRNA-disease associations. First, MUSCLE took a holistic perspective to build a multi-view graph based on the relationships between miRNAs and various biomolecules, including miRNA-drug-disease, miRNA-mRNA-disease and miRNA-lncRNA-disease association graphs, and a graph attention network was utilized to acquire the graph topology features for each view. Second, MUSCLE efficiently fused the multiple graph topology features. Third, MUSCLE also considered the prior attribute knowledge of miRNAs and diseases to achieve better prediction results and solve the cold start problem. Finally, the learned representations were concatenated and fed into an MLP for end-to-end training and prediction. To evaluate MUSCLE, we conducted 5- and 10-fold CV experiments, in which MUSCLE outperformed most current state-of-the-art models. Further ablation experiments were performed to verify the efficacy of our feature combination and fusion strategy. The case studies on colon cancer, lung cancer and breast cancer also demonstrated the good inductive capability of MUSCLE. Our method still has considerable room for improvement and generalization. For example, it can be extended to a multi-classification model that learns and predicts multiple types of MDAs at the same time. Besides, metabolomics information, single-cell sequencing and spatial transcriptome data can be incorporated into the association between miRNAs and some specific diseases. These will be the focus of our future work.
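The final prediction step summarized above (concatenating the learned miRNA and disease representations and passing them through an MLP) can be illustrated with a small forward pass. This is a sketch under stated assumptions, not the released implementation: the ReLU hidden activation, the sigmoid output, the function name `predict_association` and all weights below are hypothetical choices made for illustration, and no training is shown.

```python
import math


def dense(vector, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, x) for x in vector]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_association(mirna_repr, disease_repr, layers):
    """Concatenate the two representations and score the pair with an MLP.

    layers: list of (weights, bias) pairs; the last layer must have one output.
    """
    hidden = list(mirna_repr) + list(disease_repr)  # concatenation
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    return sigmoid(dense(hidden, weights, bias)[0])


# Toy 2-dimensional representations and hand-picked weights (all hypothetical).
mirna = [1.0, 0.0]
disease = [0.0, 1.0]
layers = [
    ([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], [0.0, 0.0]),  # hidden layer, 2 units
    ([[1.0, -1.0]], [-2.0]),                                      # output layer, 1 unit
]
score = predict_association(mirna, disease, layers)
print(score)
```

The output is a value in (0, 1) that can be read as an association score for the miRNA-disease pair.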

Key Points
• The MUSCLE method utilizes the relationships among miRNAs, complex diseases and various biological molecules to construct a heterogeneous multi-view graph, and utilizes a graph attention network to capture graph topology features.
• The MUSCLE method designs a multi-scale feature fusion strategy to efficiently fuse multiple graph topology features by incorporating the local context into the global context within the attention module.
• The MUSCLE method also considers the prior attribute knowledge of miRNAs and diseases to achieve better prediction results and solve the cold start problem.
• The MUSCLE method outperforms most of the existing methods in terms of predictive performance. Each module is tuned for its best performance, and case studies demonstrate its strong inductive capability.
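To make the idea of combining local and global context inside an attention gate more concrete, the following deliberately simplified sketch fuses two feature vectors. It is not the MUSCLE fusion module: the learned local and global branches are replaced here by two hypothetical scalar parameters (`local_scale`, `global_scale`), only two inputs are fused, and the function name `attentional_fuse` is an assumption made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attentional_fuse(x, y, local_scale=1.0, global_scale=1.0):
    """Gate two equal-length feature vectors with local plus global context.

    local context : one value per element of the summed features
    global context: one value for the whole vector (mean of the summed features)
    """
    summed = [a + b for a, b in zip(x, y)]              # initial integration
    global_ctx = global_scale * sum(summed) / len(summed)
    gate = [sigmoid(local_scale * s + global_ctx) for s in summed]
    # Each gate value weights x, and its complement weights y.
    return [g * a + (1.0 - g) * b for g, a, b in zip(gate, x, y)]


# Toy topology features of one node from two views (hypothetical values).
view_1 = [1.0, 0.0]
view_2 = [0.0, 1.0]
fused = attentional_fuse(view_1, view_2)
print(fused)

# Setting global_scale=0.0 leaves only the local branch in the gate.
local_only = attentional_fuse(view_1, view_2, global_scale=0.0)
print(local_only)
```

Because the gate and its complement sum to one, each fused element is a convex combination of the two inputs; dropping the global term gives a local-only gate, loosely analogous to an ablation that removes the global attention branch.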

Figure 1. The flowchart of MUSCLE. (A) Data sources and some symbols in this study. (B) The computation and integration of the prior attribute features. (C) Multiple heterogeneous graph construction and multi-view graph attention network for graph topology feature extraction. (D) Multi-scale attentional feature fusion mechanism for efficiently fusing the multiple graph topology features. (E) MLP for training and prediction with attribute and graph topology features.

Figure 2. The performance of MUSCLE for 5- and 10-fold CV. (A) The ROC analysis results of MUSCLE for 5-fold CV. An enlarged view of the curves is provided in the lower right corner. (B) The precision-recall results of MUSCLE for 5-fold CV. An enlarged view of the curves is provided in the lower left corner. (C) The ROC analysis results of MUSCLE for 10-fold CV. (D) The precision-recall results of MUSCLE for 10-fold CV.

Figure 3. The radar plot for the parameter analysis in MUSCLE. (A) The prediction results of MUSCLE on different embedding dimensions of miRNA and disease nodes generated by the multi-view graph attention network. (B) The prediction results of MUSCLE on different hidden layer numbers of the MLP.

Figure 4. The comparison results of ablation experiments. (A) The comparison results between different feature fusion strategies and MUSCLE. (B) The comparison results between different topological features and MUSCLE.

Figure 5. Performance comparison of MUSCLE and one SHG strategy. (A) Comparison of ROC curves of MUSCLE and one SHG strategy. (B) Comparison of PR curves of MUSCLE and one SHG strategy.

Figure 6. Comparison of the contribution of information increment and algorithmic improvement to our method over the baseline methods. (A) The contribution of information increment to our method over the baseline methods. (B) The contribution of algorithmic improvement to our method over the baseline methods.
the original feature matrix of miRNAs and diseases obtained by graph attention neural network (1778 and 889, respectively, denoted the number of miRNAs and diseases and the dimension of the graph structure features).

Table 1: The 5-fold CV performance of MUSCLE

Table 2: The 10-fold CV performance of MUSCLE

Table 3: Parameter analysis on different embedding dimensions of miRNA and disease nodes

Table 4: Parameter analysis on different hidden layer numbers of the MLP

Table 5: Performance comparison of different feature combinations in the ablation experiment

Table 6: Prediction performance of different feature fusion strategies in the ablation experiment

Table 7: Performance comparison of MUSCLE with one single heterogeneous graph strategy

9107 ± 0.0184 0.9148 ± 0.0136 0.9067 ± 0.0243 0.9074 ± 0.0242 0.8214 ± 0.0366 0.9666 ± 0.0102 performance
Second, the exclusion of the multi-scale feature fusion module may result in suboptimal final feature representations, as evidenced by the ablation experiment on this module in the "Ablation experiment for different feature fusion strategies" section.

Table 8: Performance comparison of MUSCLE with the state-of-the-art methods

Table 9: The top 50 verified miRNAs associated with Lung Cancer

Table 10: The top 50 verified associations associated with Breast Cancer

Table 11: The top 50 verified miRNAs associated with Colon Cancer